Consistency of Learned Sparse Grid Quadrature Rules using NeuralODEs

Gottschalk, Hanno, Partow, Emil, Riedlinger, Tobias J.

arXiv.org Artificial Intelligence

This paper provides a proof of the consistency of sparse grid quadrature for numerical integration of high dimensional distributions. In a first step, a transport map is learned that normalizes the distribution to a noise distribution on the unit cube. This step is built on the recently established statistical learning theory of neural ordinary differential equations. Secondly, the composition of the generative map with the quantity of interest is integrated numerically using the Clenshaw-Curtis sparse grid quadrature. A decomposition of the total numerical error into quadrature error and statistical error is provided. As the main result, it is proven in the framework of empirical risk minimization that all error terms can be controlled in the sense of PAC (probably approximately correct) learning: with high probability, the numerical integral approximates the theoretical value up to an arbitrarily small error in the limit where the data set size grows and the network capacity is increased adaptively.
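
As a rough illustration of the quadrature building block named in the abstract, the following sketch computes the one-dimensional Clenshaw-Curtis rule mapped to the unit interval. It is an assumption-laden toy: the function names `clenshaw_curtis_unit_interval` and `integrate` are hypothetical, only the 1D rule is shown, and the sparse grid (Smolyak) combination and the learned transport map from the paper are not reproduced.

```python
import math


def clenshaw_curtis_unit_interval(n):
    """Nodes and weights of the (n+1)-point Clenshaw-Curtis rule on [0, 1].

    Uses the classical closed-form weights for even n (hypothetical helper,
    not taken from the paper).
    """
    if n < 2 or n % 2 != 0:
        raise ValueError("n must be an even integer >= 2")
    nodes, weights = [], []
    for j in range(n + 1):
        theta = j * math.pi / n
        c_j = 1.0 if j in (0, n) else 2.0
        s = 0.0
        for k in range(1, n // 2 + 1):
            b_k = 1.0 if k == n // 2 else 2.0
            s += b_k / (4 * k * k - 1) * math.cos(2 * k * theta)
        w = c_j / n * (1.0 - s)  # weight of the rule on [-1, 1]
        # Affine map from [-1, 1] to [0, 1]: nodes shift, weights halve.
        nodes.append(0.5 * (1.0 + math.cos(theta)))
        weights.append(0.5 * w)
    return nodes, weights


def integrate(f, n=8):
    """Approximate the integral of f over [0, 1] with the rule above."""
    nodes, weights = clenshaw_curtis_unit_interval(n)
    return sum(w * f(x) for x, w in zip(nodes, weights))
```

For example, `integrate(math.exp, n=8)` should agree closely with `math.e - 1`, and the rule with `n + 1` nodes integrates polynomials of degree up to `n` exactly up to rounding.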


Partial-differential-algebraic equations of nonlinear dynamics by Physics-Informed Neural-Network: (I) Operator splitting and framework assessment

Vu-Quoc, Loc, Humer, Alexander

arXiv.org Artificial Intelligence

Several forms for constructing novel physics-informed neural networks (PINN) for the solution of partial-differential-algebraic equations based on derivative operator splitting are proposed, using the nonlinear Kirchhoff rod as a prototype for demonstration. The open-source DeepXDE (DDE) is likely the best-documented framework, with many examples. Yet, we encountered some pathological problems and proposed novel methods to resolve them. Among these novel methods are the PDE forms, which evolve from the lower-level form with fewer unknown dependent variables to the higher-level form with more dependent variables, in addition to those from lower-level forms. Traditionally, the highest-level form, the balance-of-momenta form, is the starting point for (hand) deriving the lowest-level form through a tedious (and error-prone) process of successive substitutions. The next step in a finite element method is to discretize the lowest-level form upon forming a weak form and linearization with appropriate interpolation functions, followed by their implementation in a code and testing. The time-consuming tedium in all of these steps could be bypassed by applying the proposed novel PINN directly to the highest-level form. We developed a script based on JAX. While our JAX script did not show the pathological problems of DDE-T (DDE with TensorFlow backend), it is slower than DDE-T. That DDE-T itself is more efficient in higher-level form than in lower-level form makes working directly with higher-level form even more attractive, in addition to the advantages mentioned above. Since coming up with an appropriate learning-rate schedule for a good solution is more art than science, we systematically codified in detail our experience running optimization through a normalization/standardization of the network-training process so readers can reproduce our results.


Discretized Gradient Flow for Manifold Learning in the Space of Embeddings

Gold, Dara, Rosenberg, Steven

arXiv.org Artificial Intelligence

Gradient descent, or negative gradient flow, is a standard technique in optimization to find minima of functions. Many implementations of gradient descent rely on discretized versions, i.e., moving in the gradient direction for a set step size, recomputing the gradient, and continuing. In this paper, we present an approach to manifold learning where gradient descent takes place in the infinite dimensional space $\mathcal{E} = {\rm Emb}(M,\mathbb{R}^N)$ of smooth embeddings $\phi$ of a manifold $M$ into $\mathbb{R}^N$. Implementing a discretized version of gradient descent for $P:\mathcal{E}\to {\mathbb R}$, a penalty function that scores an embedding $\phi \in \mathcal{E}$, requires estimating how far we can move in a fixed direction -- the direction of one gradient step -- before leaving the space of smooth embeddings. Our main result is to give an explicit lower bound for this step length in terms of the Riemannian geometry of $\phi(M)$. In particular, we consider the case when the gradient of $P$ is pointwise normal to the embedded manifold $\phi(M)$. We prove this case arises when $P$ is invariant under diffeomorphisms of $M$, a natural condition in manifold learning.
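
The discretized gradient descent described at the start of the abstract (step in the negative gradient direction, recompute the gradient, continue) can be sketched in finite dimensions as follows. This is only an illustrative assumption: the paper works in the infinite dimensional space of embeddings, whereas the penalty below is a hypothetical quadratic on $\mathbb{R}^2$, and the names `gradient_descent` and `grad_P` are invented for the example.

```python
def gradient_descent(grad, x0, step, n_steps):
    """Fixed-step discretized negative gradient flow in R^n."""
    x = list(x0)
    for _ in range(n_steps):
        g = grad(x)  # recompute the gradient at the current point
        x = [xi - step * gi for xi, gi in zip(x, g)]  # move against it
    return x


def grad_P(x):
    """Gradient of the toy penalty P(x) = (x0 - 1)^2 + 2 * (x1 + 0.5)^2."""
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 0.5)]


# Starting from the origin, the iterates approach the minimizer (1, -0.5).
x_min = gradient_descent(grad_P, [0.0, 0.0], step=0.1, n_steps=200)
```

In this toy setting any sufficiently small step works; the point of the paper is that in the space of embeddings the admissible step length is additionally limited by the requirement that the result remain a smooth embedding.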


Optimized classification with neural ODEs via separability

Álvarez-López, Antonio, Orive-Illera, Rafael, Zuazua, Enrique

arXiv.org Artificial Intelligence

Classification of $N$ points becomes a simultaneous control problem when viewed through the lens of neural ordinary differential equations (neural ODEs), which represent the time-continuous limit of residual networks. For the narrow model, with one neuron per hidden layer, it has been shown that the task can be achieved using $O(N)$ neurons. In this study, we focus on estimating the number of neurons required for efficient cluster-based classification, particularly in the worst-case scenario where points are independently and uniformly distributed in $[0,1]^d$. Our analysis provides a novel method for quantifying the probability of requiring fewer than $O(N)$ neurons, emphasizing the asymptotic behavior as both $d$ and $N$ increase. Additionally, under the sole assumption that the data are in general position, we propose a new constructive algorithm that simultaneously classifies clusters of $d$ points from any initial configuration, effectively reducing the maximal complexity to $O(N/d)$ neurons.
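
To make the link between residual networks and neural ODEs concrete, here is a minimal sketch of a forward Euler discretization of a narrow (one neuron per layer) model. It assumes the dynamics take the form $\dot{x} = w(t)\,\sigma(a(t)\cdot x + b(t))$ with a ReLU activation and piecewise constant controls; this form, the function name `narrow_neural_ode_flow`, and the sample controls are assumptions made for illustration and are not stated in the abstract.

```python
def relu(z):
    return max(0.0, z)


def narrow_neural_ode_flow(x0, controls, T):
    """Forward Euler flow of x' = w * relu(a . x + b) over [0, T].

    `controls` is a list of (w, a, b) triples, one per Euler step, so each
    step acts like one residual layer with a single hidden neuron.
    """
    h = T / len(controls)
    x = list(x0)
    for w, a, b in controls:
        s = relu(sum(ai * xi for ai, xi in zip(a, x)) + b)
        x = [xi + h * wi * s for xi, wi in zip(x, w)]  # residual update
    return x


# Ten identical layers that push the first coordinate at speed relu(x1).
controls = [([1.0, 0.0], [0.0, 1.0], 0.0)] * 10
x_T = narrow_neural_ode_flow([0.5, 0.5], controls, T=1.0)
```

Each loop iteration has the shape of a residual block, `x + h * f(x)`, which is why the continuous-time model is viewed as the limit of deep residual networks as the step `h` shrinks.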


Mean-field Approximations for Stochastic Population Processes with Heterogeneous Interactions

Sridhar, Anirudh, Kar, Soummya

arXiv.org Artificial Intelligence

This paper studies a general class of stochastic population processes in which agents interact with one another over a network. Agents update their behaviors in a random and decentralized manner according to a policy that depends only on the agent's current state and an estimate of the macroscopic population state, given by a weighted average of the neighboring states. When the number of agents is large and the network is a complete graph (has all-to-all information access), the macroscopic behavior of the population can be well-approximated by a set of deterministic differential equations called a mean-field approximation. For incomplete networks, it was previously unclear whether a suitable mean-field approximation exists in general for the macroscopic behavior of the population. The paper addresses this gap by establishing a generic theory describing when various mean-field approximations are accurate for arbitrary interaction structures. Our results are threefold. Letting $W$ be the matrix describing agent interactions, we first show that a simple mean-field approximation that incorrectly assumes a homogeneous interaction structure is accurate provided $W$ has a large spectral gap. Second, we show that a more complex mean-field approximation which takes into account agent interactions is accurate as long as the Frobenius norm of $W$ is small. Finally, we compare the predictions of the two mean-field approximations through simulations, highlighting cases where using mean-field approximations that assume a homogeneous interaction structure can lead to inaccurate qualitative and quantitative predictions.


The mathematics of adversarial attacks in AI -- Why deep learning is unstable despite the existence of stable neural networks

Bastounis, Alexander, Hansen, Anders C, Vlačić, Verner

arXiv.org Machine Learning

The unprecedented success of deep learning (DL) makes it unchallenged when it comes to classification problems. However, it is well established that the current DL methodology produces universally unstable neural networks (NNs). The instability problem has caused an enormous research effort -- with a vast literature on so-called adversarial attacks -- yet there has been no solution to the problem. Our paper addresses why there has been no solution to the problem, as we prove the following mathematical paradox: any training procedure based on training neural networks for classification problems with a fixed architecture will yield neural networks that are either inaccurate or unstable (if accurate) -- despite the provable existence of both accurate and stable neural networks for the same classification problems. The key is that the stable and accurate neural networks must have variable dimensions depending on the input; in particular, variable dimensions are a necessary condition for stability. Our result points towards the paradox that accurate and stable neural networks exist; however, modern algorithms do not compute them. This yields the question: if the existence of neural networks with desirable properties can be proven, can one also find algorithms that compute them? There are cases in mathematics where provable existence implies computability, but will this be the case for neural networks? The contrary is true, as we demonstrate how neural networks can provably exist as approximate minimisers to standard optimisation problems with standard cost functions; however, no randomised algorithm can compute them with probability better than 1/2.


Couplings for Andersen Dynamics

Bou-Rabee, Nawaf, Eberle, Andreas

arXiv.org Machine Learning

Andersen dynamics is a standard method for molecular simulations, and a precursor of the Hamiltonian Monte Carlo algorithm used in MCMC inference. The stochastic process corresponding to Andersen dynamics is a PDMP (piecewise deterministic Markov process) that iterates between Hamiltonian flows and velocity randomizations of randomly selected particles. Both from the viewpoint of molecular dynamics and MCMC inference, a basic question is to understand the convergence to equilibrium of this PDMP, particularly in high dimension. Here we present couplings to obtain sharp convergence bounds in the Wasserstein sense that do not require global convexity of the underlying potential energy. (October 1, 2020)
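
The alternation between Hamiltonian flow and velocity randomization can be sketched as below. Several simplifications are assumed and are not from the paper: the exact Hamiltonian flow is replaced by a velocity Verlet integrator, each particle is one-dimensional with unit mass at unit temperature, the flow duration is fixed rather than random, and the names `verlet_flow` and `andersen_step` are hypothetical.

```python
import random


def verlet_flow(x, v, grad_U, h, n_steps):
    """Approximate the Hamiltonian flow with velocity Verlet (unit masses)."""
    x, v = list(x), list(v)
    g = grad_U(x)
    for _ in range(n_steps):
        v = [vi - 0.5 * h * gi for vi, gi in zip(v, g)]  # half kick
        x = [xi + h * vi for xi, vi in zip(x, v)]        # drift
        g = grad_U(x)
        v = [vi - 0.5 * h * gi for vi, gi in zip(v, g)]  # half kick
    return x, v


def andersen_step(x, v, grad_U, h, n_steps, rng):
    """One cycle: deterministic flow, then resample one particle's velocity."""
    x, v = verlet_flow(x, v, grad_U, h, n_steps)
    i = rng.randrange(len(v))      # randomly selected particle
    v[i] = rng.gauss(0.0, 1.0)     # fresh velocity from a standard Gaussian
    return x, v


# Toy usage with a harmonic potential U(x) = sum(x_i^2) / 2.
grad_U = lambda x: list(x)
rng = random.Random(0)
x, v = [1.0, 0.0, -0.5], [0.0, 1.0, 0.5]
for _ in range(5):
    x, v = andersen_step(x, v, grad_U, h=0.01, n_steps=100, rng=rng)
```

Between randomizations the motion is deterministic, which is what makes the process piecewise deterministic; only the selected particle's velocity is refreshed, so the remaining velocities carry over unchanged.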


Inertial Block Mirror Descent Method for Non-Convex Non-Smooth Optimization

Hien, Le Thi Khanh, Gillis, Nicolas, Patrinos, Panagiotis

arXiv.org Machine Learning

In this paper, we propose inertial versions of block coordinate descent methods for solving non-convex non-smooth composite optimization problems. We use the general framework of Bregman distance functions to compute the proximal maps. Our method not only allows using two different extrapolation points to evaluate gradients and adding the inertial force, but also takes advantage of randomly picking the block of variables to update. Moreover, our method does not require a restarting step, and as such, it is not a monotonically decreasing method. To prove the convergence of the whole generated sequence to a critical point, we modify the convergence proof recipe of Bolte, Sabach and Teboulle (Proximal alternating linearized minimization for non-convex and non-smooth problems, Math. Prog. 146(1):459--494, 2014), and combine it with auxiliary functions. We deploy the proposed methods to solve non-negative matrix factorization (NMF) problems and show that they compete favourably with the state-of-the-art NMF algorithms.
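
A heavily simplified sketch of the ingredients listed in the abstract (random block selection, one extrapolation point for the gradient, another for the inertial term, and a proximal map) is given below. It is an assumption-based toy rather than the authors' algorithm: the Bregman distance is taken to be the squared Euclidean distance, each block is a single coordinate, the objective is the convex problem $\tfrac12\|x-c\|^2 + \lambda\|x\|_1$, and the function names and parameter values (`step`, `alpha`, `beta`) are hypothetical.

```python
import random


def soft_threshold(z, t):
    """Proximal map of t * |.| (Euclidean case)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def inertial_block_prox_gradient(c, lam, step=0.5, alpha=0.3, beta=0.2,
                                 n_iters=600, seed=0):
    """Minimize 0.5 * ||x - c||^2 + lam * ||x||_1 one random block at a time."""
    rng = random.Random(seed)
    x = [0.0] * len(c)
    x_prev = [0.0] * len(c)  # value of each block before its last update
    for _ in range(n_iters):
        i = rng.randrange(len(c))        # randomly picked block
        d = x[i] - x_prev[i]
        y_grad = x[i] + alpha * d        # extrapolation point for the gradient
        y_inertial = x[i] + beta * d     # extrapolation point for the inertia
        grad_i = y_grad - c[i]           # partial gradient of the smooth part
        x_prev[i] = x[i]
        x[i] = soft_threshold(y_inertial - step * grad_i, step * lam)
    return x


x_hat = inertial_block_prox_gradient([2.0, 0.2, -1.0], lam=0.5)
```

For this separable toy problem the minimizer is known in closed form, `soft_threshold(c_i, lam)` per coordinate, which gives a simple check on the iteration; no restarting step is used, so the objective values need not decrease monotonically.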


Local and global asymptotic inference in smoothing spline models

Shang, Zuofeng, Cheng, Guang

arXiv.org Machine Learning

This article studies local and global inference for smoothing spline estimation in a unified asymptotic framework. We first introduce a new technical tool called functional Bahadur representation, which significantly generalizes the traditional Bahadur representation in parametric models, that is, Bahadur [Ann. Inst. Statist. Math. 37 (1966) 577-580]. Equipped with this tool, we develop four interconnected procedures for inference: (i) pointwise confidence interval; (ii) local likelihood ratio testing; (iii) simultaneous confidence band; (iv) global likelihood ratio testing. In particular, our confidence intervals are proved to be asymptotically valid at any point in the support, and they are shorter on average than the Bayesian confidence intervals proposed by Wahba [J. R. Stat. Soc. Ser. B Stat. Methodol. 45 (1983) 133-150] and Nychka [J. Amer. Statist. Assoc. 83 (1988) 1134-1143]. We also discuss a version of the Wilks phenomenon arising from local/global likelihood ratio testing. It is also worth noting that our simultaneous confidence bands are the first ones applicable to general quasi-likelihood models. Furthermore, issues relating to optimality and efficiency are carefully addressed. As a by-product, we discover a surprising relationship between periodic and nonperiodic smoothing splines in terms of inference.